Abstract The Gamma Ray Array Detector (GRAD) is one of the external target facility subsystems of the Cooling Storage
Ring at the Heavy Ion Research Facility in Lanzhou (HIRFL-CSR). The trigger subsystem of GRAD is required to
make a fast L1 trigger decision with a fixed latency for the data acquisition. Because the hit signals from the detector
are asynchronous with the local clock of the trigger system, a nondeterministic latency (varying between
zero and one clock period) arises when the synchronous receivers of a conventional trigger system process the
hit signals. In this paper, an improved trigger logic based on a field-programmable gate array is developed. It
comprises zero-delay broadening circuits as receivers and an improved adding circuit designed for the new
receivers. Software simulation and experimental measurement have been conducted. Compared with the
conventional trigger logic, the improved trigger logic eliminates the nondeterministic latency
and reduces the total processing latency.
1 Introduction


A reliable trigger system is of great importance
to large detector electronics in particle physics. Fig.1
shows a universal architecture of the L1 trigger system,
simplified from the electronics of several particle
physics experiments such as BESIII, ATLAS, and ALICE.
The basic logic for making an L1 trigger decision in the
field-programmable gate array (FPGA) contains three
parts: a receiving module, an adding module, and a
comparator that compares the hit-number with the trigger
condition. When a collision event occurs, multichannel
analog signals from the detector are converted to
digital hit signals by the threshold discriminators in the
front-end electronics (FEE) and delivered to the L1
trigger system. The receiving module synchronizes and
aligns the hit signals. The adding module counts the
total number of responding channels (the hit-number)
in the event. The comparator compares the hit-number
with the trigger condition to generate an L1 trigger
decision. This is the kernel process of the L1 trigger
system. Simultaneously, the trigger information,
including position information and other useful
information, is delivered to the L2 trigger system for
further processing. Together, these multilevel trigger
decisions select the valid events for the data acquisition
(DAQ) system.
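
The kernel process described above can be sketched as a small behavioural model in Python. This is only an illustration, not the FPGA implementation: the function name `l1_decision` and the use of "greater than or equal to" for the trigger condition are assumptions made here.

```python
def l1_decision(hits, threshold):
    """Return True when the number of responding channels reaches the threshold.

    `hits` holds one value per channel (1 = channel fired, 0 = no hit).
    The name and the >= comparison are illustrative assumptions.
    """
    hit_number = sum(1 for h in hits if h)  # adding module: count fired channels
    return hit_number >= threshold          # comparator: hit-number vs trigger condition


# Example: 64 channels, 6 of which fired, trigger condition of 5 hits.
hits = [0] * 64
for ch in (3, 10, 11, 27, 40, 63):
    hits[ch] = 1
print(l1_decision(hits, threshold=5))
```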


Fig.1 A universal architecture of the multichannel L1 trigger
system.

The L1 trigger latency has two main components:
the receiving latency and the adding latency. Because
collisions occur at random times, the arrival time of
each hit signal is asynchronous with the local clock of
the L1 trigger system. The conventional receiving
module uses registers to synchronize the hit signals by
latching them on the clock edge (e.g., a FIFO with a
41.67-MHz local clock is employed as the receiving
module in the BESIII TOF trigger subsystem). Latching
on the clock edge generates the receiving latency, which
equals the time difference between the arrival time
and the clock edge. The receiving latency is
nondeterministic because its value changes randomly
between zero and one local clock period from event to
event. Its average value, half a clock period, can be
reduced somewhat by increasing the clock frequency.
However, a higher clock frequency makes it harder for
the logic design to meet timing, and the highest clock
frequency is limited by the FPGA hardware.
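
The half-period average quoted above can be checked with a short numerical sketch. It assumes an idealized receiver whose clock edges fall at integer multiples of the period and sweeps the arrival phase evenly over one period; the function name `receiving_latency` is hypothetical.

```python
def receiving_latency(arrival_ns, period_ns):
    """Delay from a hit's arrival to the next clock edge (edges at k * period_ns)."""
    phase = arrival_ns % period_ns
    return 0.0 if phase == 0 else period_ns - phase


period_ns = 1000.0 / 41.67  # local clock period for a 41.67-MHz clock, about 24 ns
steps = 1000                # number of evenly spaced arrival phases

latencies = [receiving_latency(i * period_ns / steps, period_ns) for i in range(steps)]
mean_latency = sum(latencies) / steps

# The mean approaches half a clock period as the sweep gets finer.
print(mean_latency, period_ns / 2)
```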

The GRAD electronics at the HIRFL-CSR consist
of three parts: an FEE based on ASIC chips
named MATE, a DAQ subsystem, and a 64-channel
L1 trigger subsystem. Because of the track-hold
processing in the MATEs (sampling must start at a
specific time for accurate measurement), the DAQ
subsystem requires a fast L1 trigger decision,
and the latency of the trigger decision must be fixed
for accurate off-line calibration. The synchronous
receivers of the conventional trigger logic cannot
satisfy this requirement because of their
nondeterministic receiving latency. To eliminate the
nondeterministic latency, zero-delay broadening
circuits are developed as receivers, and an improved
adding module is designed to match the new
receivers. The design and performance of the improved
L1 trigger logic are described below.


2 Improved L1 trigger logic


Figure 2 shows the improved L1 trigger logic
implemented in the FPGA. The kernel trigger logic
consists of the zero-delay broadening circuits as
receivers and the improved adding circuit. The new
receivers catch and broaden the multichannel hit
signals without the nondeterministic latency. The
improved adding circuit obtains the hit-number by
adding all hit signals together (each channel signal is
treated as 1-bit data). The hit-number is compared with
the trigger condition, and a fast L1 trigger decision is
generated so that the DAQ subsystem starts its readout
first. The latency of the L1 trigger decision is low and
fixed because there is no synchronization in the
receiving and adding processes.

Simultaneously, the conventional synchronous
trigger logic is employed to acquire trigger
information for the L2 trigger system. The slower L2
trigger decision determines whether the event data
acquired by the DAQ system should be stored.

Fig.2 The architecture of the improved L1 trigger logic.
2.1 Zero-delay broadening circuits


When charged particles strike the scintillator
corresponding to each signal channel, their arrival
times differ because of the difference in hit position [4].
The signals of the hit channels are therefore delivered
to the trigger system at different times. In each event,
one kernel channel decides the trigger latency (e.g., if
the trigger threshold of the hit-number is 5 and the
hit-number is 10, the 5th received hit signal is the
kernel. Only the latency of receiving the 5th hit signal
contributes to the trigger latency; the latencies of the
first 4 signals and the last 5 signals are of no concern).
In addition, a valid arrival time ($t_{\max}$) stands for the
maximum time difference between all channel paths
(e.g., $t_{\max}$ is 20 ns in the GRAD, meaning that all hit
signals received within 20 ns of the first hit signal
reaching the trigger system belong to the same
event). Because the hit signals differ in arrival time and
pulse width, the first operation of the receivers is to
align all the hit signals. Instead of the registers of the
conventional synchronous receiver, the zero-delay
broadening circuits are employed to complete the
aligning operation. The signal of each hit channel is
broadened to a width of more than $t_{\max}$, so there is a
period during which all hit signals presented to the
adding circuit remain at the high level. The
combinational adding circuit can count the hit-number
in this period. This is asynchronous aligning.
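
The notion of a kernel channel can be made concrete with a small sketch: for a given trigger threshold, the hit that completes the trigger condition is simply the threshold-th earliest arrival. The function name `kernel_arrival` and the arrival times below are illustrative assumptions, not measured GRAD data.

```python
def kernel_arrival(arrivals_ns, threshold):
    """Arrival time of the hit that first satisfies the trigger condition.

    Returns None when fewer than `threshold` channels fired.
    """
    ordered = sorted(arrivals_ns)
    if len(ordered) < threshold:
        return None
    return ordered[threshold - 1]


# Ten hits spread over less than t_max = 20 ns (made-up arrival times).
arrivals = [3.0, 0.0, 7.5, 12.0, 1.2, 18.9, 5.1, 9.9, 15.0, 2.4]
print(kernel_arrival(arrivals, threshold=5))  # the 5th earliest hit, at 5.1 ns
```

Only the receiving latency of this one hit enters the trigger latency, which is why a receiver that adds a random delay to each channel makes the overall latency nondeterministic.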

Figure 3 shows a zero-delay broadening circuit
built from D-type flip-flops (DFFs). To meet the
requirement that the broadening width be more than $t_{\max}$,
the minimum number of DFFs used in each channel is

$$N = t_{\max}/t_{clk} + 2 \qquad (1)$$

where $t_{clk}$ is one clock period. When a hit-signal
pulse is delivered to the clock pin of the first DFF, its
Q output goes high immediately because the D input
is tied to the high level. The following DFFs then
receive and pass on the high level from the first
DFF in sequence. Being synchronous with the local clock,
each following DFF adds a latency of one
clock cycle (the latency of the second DFF is less than
a clock period because it performs the
synchronization). After a total latency of more
than $t_{\max}$, the clear pin of the first DFF receives a
high-level feedback signal from the Q pin of the last
DFF. The first DFF is reset, and its output
stays low until the next hit signal arrives.
As a result, the hit signal is broadened to a
width of more than $t_{\max}$ with zero delay, regardless of
how the input signal changes during this process.
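
A behavioural sketch may help to show why Eq. (1) carries the extra two flip-flops: the first DFF adds no delay and the second adds anything between zero and one clock period, so only the remaining N - 2 stages contribute guaranteed full periods. The model below ignores gate delays, assumes clock edges at integer multiples of the clock period, rounds the ratio in Eq. (1) up to an integer, and uses a 10-ns clock period purely as an example; all function names are hypothetical.

```python
import math


def min_dff_count(t_max_ns, t_clk_ns):
    """Eq. (1), with the ratio rounded up so that N is an integer (assumption)."""
    return math.ceil(t_max_ns / t_clk_ns) + 2


def broadened_pulse(arrival_ns, t_clk_ns, n_dff):
    """Return (rise, fall) of the broadened output, ignoring gate delays."""
    rise = arrival_ns  # first DFF is clocked by the hit itself: zero delay
    # The second DFF samples on the first local clock edge after the arrival.
    first_edge = (math.floor(arrival_ns / t_clk_ns) + 1) * t_clk_ns
    # DFFs 3..N each add one full period; the last Q then clears the first DFF.
    fall = first_edge + (n_dff - 2) * t_clk_ns
    return rise, fall


t_max_ns, t_clk_ns = 20.0, 10.0  # t_max from the text; clock period is illustrative
n = min_dff_count(t_max_ns, t_clk_ns)
for arrival in (0.0, 2.5, 9.9, 13.7):
    rise, fall = broadened_pulse(arrival, t_clk_ns, n)
    print(arrival, rise, fall, fall - rise)
```

In this model the output always rises at the arrival time, and its width lies between N - 2 and N - 1 clock periods, so it exceeds the valid arrival time for any arrival phase.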


Fig.3 Single channel zero-delay broadening circuit.

Figure 4 shows the timing simulation of the
synchronous and the zero-delay receivers. The
synchronous receiver, built from several registers,
synchronizes and stores the hit signal for a few clock
periods. Two issues with the synchronous receiver
deserve attention. First, the latency of channel 1
does not equal that of channel 3. Second, the signal
of channel 2 (whose pulse width is less than a clock
cycle) is missed. Compared with the synchronous
receivers, the receiving latency of the zero-delay
receiver is fixed and lower. This latency consists only
of pin-to-pin delay and gate delay; it depends on the
speed grade of the FPGA and contains no
nondeterministic component. Furthermore, the
zero-delay receiver can catch signals of short pulse
width, although false triggering may result if the short
pulse is noise. In practice, this case does not occur
because of the reliable FEE design. In addition, the
broadening width is a dead time of the improved
trigger logic (any new hit signal is ignored during this
period). The broadening width of the zero-delay
receivers is therefore set to just satisfy the physical
requirement of $t_{\max}$ = 20 ns, as shown in Fig. 4.

Fig.4 Timing simulations of the synchronous receivers and
the zero-delay receivers.


2.2 Improved adding circuit 


The adding circuit receives the 64 broadened
signals as 64 1-bit data (each bit is “1”
when a hit occurs on the corresponding scintillator).
Calculating the hit-number takes 63 adding operations,
so the adding circuit consists of 63 adders. The adding
circuit is combinational, which avoids a new
nondeterministic synchronization latency. Two issues
must therefore be considered: nondeterministic latency
and race hazards. Fig.5 shows two kinds of adding
circuits, serial and parallel. The former is what the
FPGA tools produce by default if the logic is compiled
without special design, and the latter, the improved
adding logic, is superior to the former.


First, the serial adding circuit generates a new
nondeterministic latency. As shown in Fig.5(a), the
input channels have paths of different lengths (the
signal of channel 1 passes through 63 adders, but the
signal of channel 64 passes through only 1 adder).
Because the kernel channel changes randomly from
event to event, the actual processing delay (which
depends only on the kernel channel) varies
nondeterministically. In Fig.5(b), the parallel circuit
has no such problem because every input signal passes
through 7 adders.

Fig.5 The 64-channel adding circuit. (a) Serial adders, (b)
parallel adders.
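
The difference in path length between the two topologies can be illustrated by counting how many adders each channel traverses. The sketch below models the serial circuit as a simple chain and the parallel circuit as an idealized balanced pairwise tree; both are assumptions for illustration, and the absolute stage count of a real implementation (including the one drawn in Fig.5(b)) depends on how the adders are built. The property of interest is only whether all channels see the same depth.

```python
def serial_depths(n_channels):
    """Adders traversed by each channel in a chain ((c1 + c2) + c3) + ... ."""
    # Channels 1 and 2 enter the first adder; channel k (k >= 3) enters adder k-1.
    n_adders = n_channels - 1
    return [n_adders if ch <= 2 else n_channels - ch + 1
            for ch in range(1, n_channels + 1)]


def tree_depths(n_channels):
    """Adders traversed by each channel in a balanced pairwise adder tree."""
    depths = [0] * n_channels
    groups = [[ch] for ch in range(n_channels)]  # channels feeding each signal
    while len(groups) > 1:
        merged = []
        for i in range(0, len(groups) - 1, 2):
            pair = groups[i] + groups[i + 1]
            for ch in pair:
                depths[ch] += 1  # every channel in the pair passes this adder
            merged.append(pair)
        if len(groups) % 2:      # an unpaired signal is forwarded unchanged
            merged.append(groups[-1])
        groups = merged
    return depths


serial = serial_depths(64)
tree = tree_depths(64)
print(min(serial), max(serial))  # path lengths differ widely in the chain
print(set(tree))                 # a single common depth in the balanced tree
```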


Figure 6(a) shows the latency for different
channels (the input signals are applied to channels 1,
12, 24, 36, 48 and 56, from left to right). The latency of
the serial circuit changes with the input channel,
whereas the latency of the parallel circuit is always fixed.

Second, the serial circuit generates more
glitches. Because of the asymmetric paths of the 64
channels, there are more junctions when the serial
circuit is compiled to look-up tables (LUTs) in the
FPGA. If a signal passing a junction travels along two
paths of different lengths and reaches two LUTs, or two
inputs of one LUT, a race hazard occurs and causes
a glitch. Because all channel paths of the
parallel circuit are the same, the parallel circuit causes
fewer glitches. Fig.6(b) shows the
glitches of the two adding circuits in an extreme state.
The glitches of the parallel circuit are fewer and
shorter than those of the serial circuit. In practice,
because the hit-number in one event is smaller than 10
in the gamma-ray energy measuring experiment, the
glitches are fewer than in the simulation.

Fig.6 Timing simulations of the serial adding circuit and the parallel adding circuit. (a) The latency, (b) the glitches.


A grounded capacitor connected to the output pin
of the trigger decision is used to eliminate the glitches.
The voltage of the charging capacitor is

$$V_t = E \times [1 - \exp(-t/RC)] \qquad (2)$$

where $E$ is the output voltage of the FPGA pin, $R$
is its output impedance (about 10 Ω here), $C$ is
the capacitance, and $t$ is the charging time. If $RC$ is
greater than $t$, the capacitor voltage is less than
$0.63E$. A glitch whose pulse width is less than $t$
is therefore eliminated in the LVTTL signal (the
threshold of the high level is more than $0.63E$). For
the improved trigger logic, we choose a 10 pF
grounded capacitor. It is capable of
eliminating glitches with widths of less than 1 ns,
which is sufficient for the parallel adding
circuit. The capacitor also delays the output signal
of the trigger decision by $RC$.
A circuit with more glitches therefore needs a larger
capacitor, which adds more trigger latency. As a
result, the improved adding circuit of parallel adders
has the advantage of lower and fixed latency
compared with the serial adding circuit.
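
Eq. (2) and the 0.63E criterion can be evaluated numerically. The sketch below works in normalized units (output voltage of 1 and time measured in units of RC) so that it does not depend on any particular component values; the function names and the default threshold fraction of 0.63 taken from the text are the only inputs, and the pulse widths tried are arbitrary examples.

```python
import math


def capacitor_voltage(e, t, rc):
    """Eq. (2): capacitor voltage after charging for time t with time constant rc."""
    return e * (1.0 - math.exp(-t / rc))


def glitch_suppressed(e, width, rc, threshold_fraction=0.63):
    """True when a pulse of this width cannot charge the capacitor to the threshold."""
    return capacitor_voltage(e, width, rc) < threshold_fraction * e


# Normalized units: E = 1, RC = 1.
print(capacitor_voltage(1.0, 1.0, 1.0))  # 1 - exp(-1), about 0.632
print(glitch_suppressed(1.0, 0.5, 1.0))  # pulse shorter than RC: filtered out
print(glitch_suppressed(1.0, 2.0, 1.0))  # pulse longer than RC: passes through
```

The same functions show the trade-off stated in the text: raising RC filters wider glitches but delays a genuine trigger edge by a correspondingly longer time.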


3 Experimental measurement


To quantify the latency difference between the
improved asynchronous and the conventional
synchronous trigger logics, a test system has been
assembled. The $^{60}$Co source and CsI scintillator of the
GRAD generate hit signals, and the FEE modules
convert the analog hit signals to digital hit signals. The
64-channel L1 trigger module receives and processes
the hit signals in the test logic implemented in an FPGA.
The test logic contains the conventional trigger logic
with synchronous receivers and the improved trigger
logic with zero-delay receivers. Both logics use the
parallel adders as their adding circuits, which reduces
the influence of other differences. The two logics receive
the hit signals at the same time and generate L1 trigger
decisions. The trigger decisions are delivered to three
counters that measure the latency difference, the trigger
frequency, and false triggering. A latency timer,
running on a 250-MHz clock supplied by a phase-locked
loop (PLL), starts counting on receiving the
improved trigger decision and stops counting on
receiving the conventional trigger decision. The
timer result is therefore the latency difference between
the two logics. Every 0.25 second, the data of the three
counters are packed and stored in a FIFO, and the
three counters are reset. When the FIFO is close to full,
the data are transmitted to the host computer via
the peripheral component interconnect (PCI) bus. The
software calculates the average latency difference
per event, as shown in Fig.7.

Fig.7 The architecture of the test system. (a) Photograph, (b) logic.

Table 1 shows the latency difference between
the improved and the conventional trigger logics at
different clock frequencies. Theoretically, the latency
difference equals half a clock period on average,
but the following factors can cause errors. First, the
clock frequency of the timer is only 250 MHz (the
FPGA limits the highest clock frequency), so the
measuring precision is 4 ns. Second, because the timer
and trigger clocks are generated by the same PLL
and clock source, there is a delay of one timer clock
period when the timer catches the synchronous trigger
decision (e.g., there is always a 4-ns delay at the
10-MHz trigger clock when the timer stops counting).
Although the limited measuring precision causes some
errors, the result shows that the
improved logic reduces the latency by about half a
clock cycle. The reduced latency agrees with the
average value of the nondeterministic latency, which
changes randomly between zero and one clock cycle.
Since the only difference between the two logics in the
test is their receivers, the nondeterministic latency is
eliminated by the improved trigger logic.


Table 1 The reduced latency at different clock frequencies

Clock frequency/MHz   Latency difference/ns   Half a period/ns   Error/ns   Trigger frequency/Hz   False triggering/%
10                    54.3                    50                 4.3        417                    0
20                    29.8                    25                 4.8        418                    0
40                    12.3                    12.5               -0.2       416                    0
60                    8.4                     8.33               -0.07      417                    0
80                    6.2                     6.25               -0.05      418                    0
100                   4.9                     5                  -0.1       418                    0
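
The "Half a period" column and the 4-ns timer precision follow directly from the clock frequencies. A minimal check, with hypothetical helper names:

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def half_period_ns(freq_mhz):
    """Half of one clock period, the expected average receiving latency."""
    return period_ns(freq_mhz) / 2.0


for f in (10, 20, 40, 60, 80, 100):
    print(f, round(half_period_ns(f), 2))

timer_resolution_ns = period_ns(250)  # 250-MHz latency timer
print(timer_resolution_ns)
```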
4 Conclusions


The improved L1 trigger logic with zero-delay
broadening circuits as receivers eliminates the
nondeterministic latency and reduces the total latency
by half a clock period on average. The improved
L1 trigger logic has been implemented in the GRAD
trigger subsystem and performs reliably. It is also
suitable for other L1 trigger systems that must make
the L1 trigger decision with a fixed and low latency.